# combining the two data sets
EPA_combined <- rbind(EPA2002, EPA2022, fill = TRUE)
PM2.5 Trends in California
Project Description
I will work with air pollution data from the U.S. Environmental Protection Agency (EPA), which operates a national network of air pollution monitoring sites. The primary question I will answer is whether daily concentrations of PM\(_{2.5}\) (particulate matter air pollution with aerodynamic diameter less than 2.5 \(\mu\)m) have decreased in California over the last 20 years (from 2002 to 2022).
Exploratory Data Analysis
Summary of 2002 Findings
The 2002 data set consists of 15,976 rows and 22 columns (variables), with no apparent missing data in the headers or footers. Initial checks of the data structure indicated a mix of character, integer, and numeric data types. The variable names include Date, Source, Site ID, POC, Daily Mean PM\(_{2.5}\) Concentration, Units, Daily AQI Value, Local Site Name, Daily Obs Count, Percent Complete, AQS Parameter Code, AQS Parameter Description, Method Code, Method Description, CBSA Code, CBSA Name, State FIPS Code, State, County FIPS Code, County, Site Latitude, and Site Longitude. The character variables of interest are Date, State, and County, while the numerical variables under study are Daily Mean PM\(_{2.5}\) Concentration, Site Latitude, and Site Longitude.
Upon examining the data, the Daily Mean PM\(_{2.5}\) Concentration values range from 0 to 104.3 µg/m³, with a mean of 16.12 µg/m³ and a median of 12 µg/m³. There are no missing values in the Daily Mean PM\(_{2.5}\) Concentration column, so the key variable of interest is complete for analysis. Nonetheless, missing values in other columns warrant further investigation to ensure data quality, and potential outliers in the PM\(_{2.5}\) measurements should be examined more closely to identify any inconsistencies.
Summary of 2022 Findings
The 2022 data set contains 59,756 rows and 22 columns (variables), with the headers and footers loaded correctly. There is evidence of missing data, though not in the main variable of interest, Daily Mean PM\(_{2.5}\) Concentration. The variable names and types remain consistent with the 2002 data set. The Daily Mean PM\(_{2.5}\) Concentration values range from -6.7 to 302.5 µg/m³, with a mean of 8.43 µg/m³ and a median of 6.8 µg/m³. Negative values are unusual for PM\(_{2.5}\), since a concentration of particulate matter in the air cannot physically be below zero. A negative value might indicate an issue with data collection, sensor calibration, or data processing.
Data Analysis
- Combine the two years of data into one data frame. Use the Date variable to create a new column for year, which will serve as an identifier. Change the names of the key variables so that they are easier to refer to in your code.
# converting date to date format
EPA_combined$Date <- as.Date(EPA_combined$Date, format = "%m/%d/%Y")
# creating a 'Year' column from the date
EPA_combined$Year <- format(EPA_combined$Date, "%Y")
# renaming the columns of key variables
setnames(EPA_combined, old = c("Daily Mean PM2.5 Concentration", "Daily AQI Value",
                               "Site ID", "Site Latitude", "Site Longitude"),
         new = c("PM2.5", "AQI", "Site_ID", "Latitude", "Longitude"))
# checking the new data set
summary(EPA_combined)
Date Source Site_ID POC
Min. :2002-01-01 Length:75732 Min. :60010007 Min. : 1.000
1st Qu.:2022-01-19 Class :character 1st Qu.:60290016 1st Qu.: 1.000
Median :2022-05-14 Mode :character Median :60612003 Median : 3.000
Mean :2018-04-13 Mean :60560422 Mean : 3.309
3rd Qu.:2022-09-09 3rd Qu.:60731022 3rd Qu.: 3.000
Max. :2022-12-31 Max. :61131003 Max. :24.000
PM2.5 Units AQI Local Site Name
Min. : -6.70 Length:75732 Min. : 0.0 Length:75732
1st Qu.: 4.50 Class :character 1st Qu.: 25.0 Class :character
Median : 7.60 Mode :character Median : 42.0 Mode :character
Mean : 10.05 Mean : 43.5
3rd Qu.: 12.20 3rd Qu.: 57.0
Max. :302.50 Max. :454.0
Daily Obs Count Percent Complete AQS Parameter Code AQS Parameter Description
Min. :1 Min. :100 Min. :88101 Length:75732
1st Qu.:1 1st Qu.:100 1st Qu.:88101 Class :character
Median :1 Median :100 Median :88101 Mode :character
Mean :1 Mean :100 Mean :88197
3rd Qu.:1 3rd Qu.:100 3rd Qu.:88101
Max. :1 Max. :100 Max. :88502
Method Code Method Description CBSA Code CBSA Name
Min. :117.0 Length:75732 Min. :12540 Length:75732
1st Qu.:170.0 Class :character 1st Qu.:31080 Class :character
Median :170.0 Mode :character Median :40140 Mode :character
Mean :327.8 Mean :34595
3rd Qu.:707.0 3rd Qu.:41740
Max. :810.0 Max. :49700
NA's :5496
State FIPS Code State County FIPS Code County
Min. :6 Length:75732 Min. : 1.00 Length:75732
1st Qu.:6 Class :character 1st Qu.: 29.00 Class :character
Median :6 Mode :character Median : 61.00 Mode :character
Mean :6 Mean : 55.89
3rd Qu.:6 3rd Qu.: 73.00
Max. :6 Max. :113.00
Latitude Longitude Year
Min. :32.58 Min. :-124.2 Length:75732
1st Qu.:34.07 1st Qu.:-121.4 Class :character
Median :36.48 Median :-119.3 Mode :character
Mean :36.19 Mean :-119.5
3rd Qu.:37.96 3rd Qu.:-117.9
Max. :41.76 Max. :-115.5
head(EPA_combined)
Date Source Site_ID POC PM2.5 Units AQI Local Site Name
<Date> <char> <int> <int> <num> <char> <int> <char>
1: 2002-01-05 AQS 60010007 1 25.1 ug/m3 LC 81 Livermore
2: 2002-01-06 AQS 60010007 1 31.6 ug/m3 LC 93 Livermore
3: 2002-01-08 AQS 60010007 1 21.4 ug/m3 LC 74 Livermore
4: 2002-01-11 AQS 60010007 1 25.9 ug/m3 LC 82 Livermore
5: 2002-01-14 AQS 60010007 1 34.5 ug/m3 LC 98 Livermore
6: 2002-01-17 AQS 60010007 1 41.0 ug/m3 LC 115 Livermore
Daily Obs Count Percent Complete AQS Parameter Code
<int> <num> <int>
1: 1 100 88101
2: 1 100 88101
3: 1 100 88101
4: 1 100 88101
5: 1 100 88101
6: 1 100 88101
AQS Parameter Description Method Code Method Description
<char> <int> <char>
1: PM2.5 - Local Conditions 120 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS
2: PM2.5 - Local Conditions 120 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS
3: PM2.5 - Local Conditions 120 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS
4: PM2.5 - Local Conditions 120 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS
5: PM2.5 - Local Conditions 120 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS
6: PM2.5 - Local Conditions 120 Andersen RAAS2.5-300 PM2.5 SEQ w/WINS
CBSA Code CBSA Name State FIPS Code State
<int> <char> <int> <char>
1: 41860 San Francisco-Oakland-Hayward, CA 6 California
2: 41860 San Francisco-Oakland-Hayward, CA 6 California
3: 41860 San Francisco-Oakland-Hayward, CA 6 California
4: 41860 San Francisco-Oakland-Hayward, CA 6 California
5: 41860 San Francisco-Oakland-Hayward, CA 6 California
6: 41860 San Francisco-Oakland-Hayward, CA 6 California
County FIPS Code County Latitude Longitude Year
<int> <char> <num> <num> <char>
1: 1 Alameda 37.68753 -121.7842 2002
2: 1 Alameda 37.68753 -121.7842 2002
3: 1 Alameda 37.68753 -121.7842 2002
4: 1 Alameda 37.68753 -121.7842 2002
5: 1 Alameda 37.68753 -121.7842 2002
6: 1 Alameda 37.68753 -121.7842 2002
tail(EPA_combined)
Date Source Site_ID POC PM2.5 Units AQI Local Site Name
<Date> <char> <int> <int> <num> <char> <int> <char>
1: 2022-12-01 AQS 61131003 1 3.4 ug/m3 LC 19 Woodland-Gibson Road
2: 2022-12-07 AQS 61131003 1 3.8 ug/m3 LC 21 Woodland-Gibson Road
3: 2022-12-13 AQS 61131003 1 6.0 ug/m3 LC 33 Woodland-Gibson Road
4: 2022-12-19 AQS 61131003 1 34.8 ug/m3 LC 99 Woodland-Gibson Road
5: 2022-12-25 AQS 61131003 1 23.2 ug/m3 LC 77 Woodland-Gibson Road
6: 2022-12-31 AQS 61131003 1 1.0 ug/m3 LC 6 Woodland-Gibson Road
Daily Obs Count Percent Complete AQS Parameter Code
<int> <num> <int>
1: 1 100 88101
2: 1 100 88101
3: 1 100 88101
4: 1 100 88101
5: 1 100 88101
6: 1 100 88101
AQS Parameter Description Method Code
<char> <int>
1: PM2.5 - Local Conditions 145
2: PM2.5 - Local Conditions 145
3: PM2.5 - Local Conditions 145
4: PM2.5 - Local Conditions 145
5: PM2.5 - Local Conditions 145
6: PM2.5 - Local Conditions 145
Method Description CBSA Code
<char> <int>
1: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
2: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
3: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
4: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
5: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
6: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
CBSA Name State FIPS Code State
<char> <int> <char>
1: Sacramento--Roseville--Arden-Arcade, CA 6 California
2: Sacramento--Roseville--Arden-Arcade, CA 6 California
3: Sacramento--Roseville--Arden-Arcade, CA 6 California
4: Sacramento--Roseville--Arden-Arcade, CA 6 California
5: Sacramento--Roseville--Arden-Arcade, CA 6 California
6: Sacramento--Roseville--Arden-Arcade, CA 6 California
County FIPS Code County Latitude Longitude Year
<int> <char> <num> <num> <char>
1: 113 Yolo 38.66121 -121.7327 2022
2: 113 Yolo 38.66121 -121.7327 2022
3: 113 Yolo 38.66121 -121.7327 2022
4: 113 Yolo 38.66121 -121.7327 2022
5: 113 Yolo 38.66121 -121.7327 2022
6: 113 Yolo 38.66121 -121.7327 2022
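A quick sanity check on the combining step is to confirm that the combined table has as many rows as the two inputs together and that the derived year takes only the two expected values. The sketch below illustrates the idea with two tiny made-up data frames (`toy2002` and `toy2022` are hypothetical stand-ins for the EPA files) and base R only; it is not the code used to produce the output above.

```r
# Toy stand-ins for the two yearly files (values are made up for illustration)
toy2002 <- data.frame(Date = c("01/05/2002", "01/06/2002"),
                      PM2.5 = c(25.1, 31.6))
toy2022 <- data.frame(Date = c("12/25/2022", "12/31/2022", "12/19/2022"),
                      PM2.5 = c(23.2, 1.0, 34.8))

# Stack the two years, then derive the year from the parsed date
toy_combined <- rbind(toy2002, toy2022)
toy_combined$Date <- as.Date(toy_combined$Date, format = "%m/%d/%Y")
toy_combined$Year <- as.integer(format(toy_combined$Date, "%Y"))

# Row counts should add up, and only the two expected years should appear
rows_match <- nrow(toy_combined) == nrow(toy2002) + nrow(toy2022)
years_seen <- sort(unique(toy_combined$Year))
obs_per_year <- table(toy_combined$Year)
print(rows_match)
print(obs_per_year)
```

Applied to the data in this report, the same check corresponds to 15,976 + 59,756 = 75,732 rows, which matches the length reported by summary().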
- Create a basic map in leaflet() that shows the locations of the sites (make sure to use different colors for each year). Summarize the spatial distribution of the monitoring sites.
# ensuring the data is in the correct format
EPA_combined$Year <- as.numeric(EPA_combined$Year)
# defining a color palette for the years (2002 and 2022)
palette <- colorFactor(palette = c("turquoise", "pink"), domain = EPA_combined$Year)
# creating the leaflet map
leaflet(EPA_combined) %>%
addTiles() %>%
addCircleMarkers(
~Longitude, ~Latitude, # Set the longitude and latitude
color = ~palette(Year), # Use different colors for each year
popup = ~paste("Site ID:", Site_ID, "<br>",
"Year:", Year, "<br>",
"PM2.5:", PM2.5, "<br>",
"AQI:", AQI), # Popup information
radius = 5, fillOpacity = 0.8, stroke = FALSE
) %>%
addLegend(
"bottomright",
pal = palette,
values = ~Year,
title = "Monitoring Year",
opacity = 1
)
Summary of the spatial distribution of the monitoring sites
In 2002, monitoring sites were mainly concentrated around major cities like Los Angeles, San Francisco, and Sacramento, with less coverage in central and eastern regions. By 2022, the number of monitoring sites increased, especially in previously underrepresented areas, indicating an expansion of air quality monitoring infrastructure over the two decades.
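The map code passes every daily record to leaflet, so each site is drawn once per observation. One way to back up the statement about the number of sites is to reduce the data to one row per site and year before counting or mapping. The following is a minimal sketch on made-up data (the data frame `toy` and its values are hypothetical; only the column names follow the renamed variables used in this report).

```r
# Toy daily records: several observations per site (values are made up)
toy <- data.frame(
  Site_ID   = c(1, 1, 2, 1, 2, 3, 3),
  Year      = c(2002, 2002, 2002, 2022, 2022, 2022, 2022),
  Latitude  = c(34.1, 34.1, 37.7, 34.1, 37.7, 36.5, 36.5),
  Longitude = c(-118.2, -118.2, -121.8, -118.2, -121.8, -119.3, -119.3)
)

# Keep one row per site and year so each site is drawn (and counted) once
sites <- unique(toy[, c("Site_ID", "Year", "Latitude", "Longitude")])

# Number of distinct monitoring sites in each year
sites_per_year <- table(sites$Year)
print(sites_per_year)
```

Passing a de-duplicated table like `sites` to the map would also keep the number of markers small, which tends to make the rendered map more responsive.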
- Check for any missing or implausible values of PM\(_{2.5}\) in the combined dataset. Explore the proportions of each and provide a summary of any temporal patterns you see in these observations.
# checking for missing values in PM2.5
missing_PM25 <- EPA_combined[is.na(PM2.5), .N]
# checking for implausible values (negative values, or values above 500 ug/m^3, the upper limit taken from the EPA, 2012)
implausible_PM25 <- EPA_combined[PM2.5 < 0 | PM2.5 > 500, .N]
# total number of observations
total_obs <- nrow(EPA_combined)
# calculating proportions of missing and implausible values
prop_missing <- missing_PM25 / total_obs
prop_implausible <- implausible_PM25 / total_obs
# summary of findings
cat("Total Observations:", total_obs, "\n")
Total Observations: 75732
cat("Missing PM2.5 Values:", missing_PM25, "(", round(prop_missing * 100, 2), "% )\n")
Missing PM2.5 Values: 0 ( 0 % )
cat("Implausible PM2.5 Values:", implausible_PM25, "(", round(prop_implausible * 100, 2), "% )\n")
Implausible PM2.5 Values: 215 ( 0.28 % )
# exploring temporal patterns in missing and implausible values
missing_by_year <- EPA_combined[is.na(PM2.5), .N, by = Year]
implausible_by_year <- EPA_combined[PM2.5 < 0 | PM2.5 > 500, .N, by = Year]
# displaying the missing and implausible values by year
missing_by_year
Empty data.table (0 rows and 2 cols): Year,N
implausible_by_year
Year N
<num> <int>
1: 2022 215
# examining frequency of implausible values by month
implausible_values <- subset(EPA_combined, PM2.5 < 0 | PM2.5 > 500)
# extracting month from the Date column
implausible_values$Month <- format(as.Date(implausible_values$Date), "%Y-%m")
# creating a table or summary of the count of implausible values by month
implausible_by_month <- table(implausible_values$Month)
# converting to a data frame for easier plotting or viewing
implausible_by_month_df <- as.data.frame(implausible_by_month)
# view the distribution
print(implausible_by_month_df)
Var1 Freq
1 2022-01 23
2 2022-02 18
3 2022-03 8
4 2022-04 4
5 2022-05 12
6 2022-06 19
7 2022-07 27
8 2022-08 7
9 2022-09 21
10 2022-10 4
11 2022-11 26
12 2022-12 46
Summary of temporal patterns
The combined dataset has a total of 75,732 observations with no missing values for PM\(_{2.5}\) (a missing proportion of 0%). However, 215 observations (0.28%) are implausible, defined here as PM\(_{2.5}\) concentrations below 0 or above 500 µg/m³ (the upper limit taken from the EPA, 2012). Because the largest recorded value is 302.5 µg/m³, all 215 flagged observations are negative readings. All of them occurred in 2022, with none in 2002. By month, the flagged values are spread throughout the year, with the most in December (46) and July (27) and the fewest in April and October (4 each).
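To see which kind of implausible value drives each monthly count, the flagged readings can be labeled by type before tabulating. The sketch below uses a small made-up data frame (`toy` and all of its values are hypothetical) and the same two cutoffs as above.

```r
# Toy PM2.5 readings with dates (values are made up for illustration)
toy <- data.frame(
  Date  = as.Date(c("2022-01-03", "2022-01-15", "2022-07-04",
                    "2022-12-24", "2022-12-25")),
  PM2.5 = c(-1.2, 8.4, -0.3, 510, -2.0)
)

# Label each reading: negative, above the 500 ug/m3 cutoff, or plausible
toy$flag <- ifelse(toy$PM2.5 < 0, "negative",
            ifelse(toy$PM2.5 > 500, "above_500", "plausible"))

# Count flagged readings by month and by type
flagged <- toy[toy$flag != "plausible", ]
flagged$Month <- format(flagged$Date, "%Y-%m")
flag_counts <- table(flagged$Month, flagged$flag)
print(flag_counts)
```

A two-way table like `flag_counts` makes it easy to tell whether a month with many flagged values reflects negative readings or readings above the upper cutoff.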
- Explore the main question of interest at three different spatial levels. Create exploratory plots (e.g. boxplots, histograms, line plots) and summary statistics that best suit each level of data. Be sure to write up explanations of what you observe in these data.
- State
# sub-setting for California data
california_data <- EPA_combined[State == "California"]
# summary statistics for PM2.5 in California across years
summary_stats_state <- california_data %>%
group_by(Year) %>%
summarize(
mean_PM2.5 = mean(PM2.5, na.rm = TRUE),
median_PM2.5 = median(PM2.5, na.rm = TRUE),
sd_PM2.5 = sd(PM2.5, na.rm = TRUE),
min_PM2.5 = min(PM2.5, na.rm = TRUE),
max_PM2.5 = max(PM2.5, na.rm = TRUE),
count = n()
)
# printing the summary statistics
print(summary_stats_state)
# A tibble: 2 × 7
Year mean_PM2.5 median_PM2.5 sd_PM2.5 min_PM2.5 max_PM2.5 count
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 2002 16.1 12 13.9 0 104. 15976
2 2022 8.43 6.8 7.64 -6.7 302. 59756
# histogram of PM2.5 by year
ggplot(data = california_data) +
geom_histogram(aes(x = PM2.5, fill = as.factor(Year)),
position = "identity", alpha = 0.6, binwidth = 2) +
labs(title = "PM2.5 by Year in California", x = "Daily Mean PM2.5 Concentration (µg/m³)",
fill = "Year") +
theme_minimal()
# boxplot of PM2.5 by year
ggplot(california_data, aes(x = as.factor(Year), y = PM2.5)) +
geom_boxplot(fill = "pink", color = "purple", alpha = 0.7) +
labs(title = "PM2.5 Concentrations by Year in California (2002-2022)",
x = "Year",
y = "Daily Mean PM2.5 Concentration (µg/m³)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# creating a summary data.table for highlighted years
highlight_years <- california_data[Year %in% c(2002, 2022), .(Mean_PM2.5 = mean(PM2.5, na.rm = TRUE)), by = Year]
# creating the line plot
ggplot(california_data, aes(x = Year, y = PM2.5)) +
# Line plot for average PM2.5 using linewidth
geom_line(stat = "summary", fun = mean, color = "pink", linewidth = 1) +
# adding points for highlighted years
geom_point(data = highlight_years, aes(x = Year, y = Mean_PM2.5),
size = 3, color = "purple", fill = "purple", shape = 21) +
# adding circles around the points for emphasis
geom_point(data = highlight_years, aes(x = Year, y = Mean_PM2.5),
size = 6, color = "purple", shape = 1) +
labs(title = "Average PM2.5 Concentration Over Time in California (2002-2022)",
x = "Year",
y = "Average Daily Mean PM2.5 (µg/m³)") +
theme_minimal()
Summary of observations
The data indicate a substantial decrease in daily PM\(_{2.5}\) concentrations in California from 2002 to 2022. The mean concentration dropped from 16.12 µg/m³ to 8.43 µg/m³, a reduction of about 48%, and the median fell from 12 µg/m³ to 6.8 µg/m³. The standard deviation also narrowed (from 13.9 to 7.64 µg/m³), suggesting that typical days varied less around the mean. The highest single value was nevertheless recorded in 2022 (302.5 µg/m³), so occasional severe pollution events still occurred, but most days showed much lower PM\(_{2.5}\) levels than in 2002. This trend is consistent with advances in air quality management and pollution control over the past two decades, although the analysis here does not test that link directly.
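The size of the state-level reduction can be worked out directly from the two means in the summary table. The short sketch below does only that arithmetic; the variable names are illustrative.

```r
# State-level means reported in the summary table above (ug/m3)
mean_2002 <- 16.12
mean_2022 <- 8.43

# Absolute and relative change from 2002 to 2022
abs_change <- mean_2022 - mean_2002
pct_change <- 100 * abs_change / mean_2002
print(round(c(abs_change = abs_change, pct_change = pct_change), 1))
```

The relative change comes out at roughly -47.7%, which is where the "about 48%" figure comes from.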
- County
# summary statistics for PM2.5 by counties in California across years
summary_stats_county <- EPA_combined %>%
group_by(County, Year) %>%
summarize(
mean_PM2.5 = mean(PM2.5, na.rm = TRUE),
median_PM2.5 = median(PM2.5, na.rm = TRUE),
sd_PM2.5 = sd(PM2.5, na.rm = TRUE),
min_PM2.5 = min(PM2.5, na.rm = TRUE),
max_PM2.5 = max(PM2.5, na.rm = TRUE),
count = n(),
.groups = "drop" # Add this line to control grouping behavior
) %>%
arrange(County, Year)
# printing the summary statistics
print(summary_stats_county)
# A tibble: 98 × 8
County Year mean_PM2.5 median_PM2.5 sd_PM2.5 min_PM2.5 max_PM2.5 count
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 Alameda 2002 14.3 10 11.4 1.9 61.6 201
2 Alameda 2022 8.20 7 4.95 -0.7 35.5 1793
3 Butte 2002 14.8 11.5 11.7 1 88 473
4 Butte 2022 6.19 4.5 5.79 -0.6 42.8 1121
5 Calaveras 2002 9.9 8 6.50 2 40 60
6 Calaveras 2022 6.04 5 4.10 0 25.9 355
7 Colusa 2002 11.7 9 10.0 1 57 95
8 Colusa 2022 7.61 6.7 4.76 0.6 37 401
9 Contra Costa 2002 15.1 9.5 14.5 2 76.7 276
10 Contra Costa 2022 8.25 7.3 4.92 0.9 37.3 817
# ℹ 88 more rows
# ensuring 'Year' is treated as a factor
EPA_combined$Year <- as.factor(EPA_combined$Year)
# creating a bar plot of the county-level means computed above
# (plotting the raw daily values with stat = "identity" would overlay one bar per observation instead of showing means)
ggplot(data = summary_stats_county, aes(x = County, y = mean_PM2.5, fill = as.factor(Year))) +
geom_col(position = "dodge") +
scale_fill_manual(values = c("2002" = "turquoise", "2022" = "pink")) +
labs(title = "PM2.5 Trends by County (2002 vs 2022)",
x = "County",
y = "Mean Daily PM2.5 (µg/m³)",
fill = "Year") +
coord_flip()
Summary of observations
From 2002 to 2022, air quality in California's counties, as measured by PM\(_{2.5}\) levels, generally improved substantially. For example, mean PM\(_{2.5}\) in Alameda County fell from 14.25 µg/m³ to 8.20 µg/m³, and in Butte County from 14.76 µg/m³ to 6.19 µg/m³. Similar downward trends appeared in other counties, such as Fresno, where mean PM\(_{2.5}\) dropped from 19.93 µg/m³ to 10.19 µg/m³. The decrease in both mean and maximum PM\(_{2.5}\) values points to improved air quality, although variability persisted in some areas with occasional spikes, such as Trinity and Placer counties. Overall, the state showed marked improvement, with fewer high pollution days in 2022 than in 2002.
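To compare counties directly, the long summary table can be reshaped so that each county has its 2002 and 2022 means side by side, and the change can then be ranked. The sketch below uses only the five counties visible in the printed table above, with means as rounded there; the object names (`county_means`, `wide`) are illustrative.

```r
# County means copied from the first rows of the summary table above (ug/m3)
county_means <- data.frame(
  County = rep(c("Alameda", "Butte", "Calaveras", "Colusa", "Contra Costa"),
               each = 2),
  Year   = rep(c(2002, 2022), times = 5),
  mean_PM2.5 = c(14.3, 8.20, 14.8, 6.19, 9.9, 6.04, 11.7, 7.61, 15.1, 8.25)
)

# One row per county, with the 2002 and 2022 means side by side
m2002 <- county_means[county_means$Year == 2002, c("County", "mean_PM2.5")]
m2022 <- county_means[county_means$Year == 2022, c("County", "mean_PM2.5")]
wide <- merge(m2002, m2022, by = "County", suffixes = c("_2002", "_2022"))

# Change in mean (negative = improvement), sorted from largest drop to smallest
wide$change <- wide$mean_PM2.5_2022 - wide$mean_PM2.5_2002
wide <- wide[order(wide$change), ]
print(wide)
```

Among these five counties, Butte shows the largest drop and Calaveras the smallest; the same reshaping applied to the full table would rank all counties.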
- Sites in Los Angeles
# sub-setting Los Angeles site data
la_data <- EPA_combined[County == "Los Angeles"]
# summary statistics for PM2.5 in Los Angeles sites across years
summary_stats_la <- la_data %>%
group_by(Year) %>%
summarize(
mean_PM2.5 = mean(PM2.5, na.rm = TRUE),
median_PM2.5 = median(PM2.5, na.rm = TRUE),
sd_PM2.5 = sd(PM2.5, na.rm = TRUE),
n = n()
)
# printing the summary statistics
print(summary_stats_la)
# A tibble: 2 × 5
Year mean_PM2.5 median_PM2.5 sd_PM2.5 n
<fct> <dbl> <dbl> <dbl> <int>
1 2002 19.7 17.4 11.9 1879
2 2022 11.0 10.3 5.24 5070
# histogram of PM2.5 in Los Angeles sites
ggplot(la_data, aes(x = PM2.5, fill = Year)) +
geom_histogram(binwidth = 2, color = "pink", alpha = 0.7, position = "identity") +
labs(title = "Distribution of Daily Mean PM2.5 Concentrations at Los Angeles sites (2002-2022)",
x = "Daily Mean PM2.5 Concentration (µg/m³)",
y = "Frequency") +
scale_fill_manual(values = c("2002" = "turquoise", "2022" = "pink")) + # Custom colors for each year
theme_minimal() +
theme(legend.position = "top")
# loading gridExtra library
library(gridExtra)
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
combine
# Splitting data into 2002 and 2022 subsets
LA_2002 <- subset(la_data, Year == 2002)
LA_2022 <- subset(la_data, Year == 2022)
# Ensure correct handling of dates (adding the year manually)
LA_2002$Date <- as.Date(paste("2002", format(LA_2002$Date, "%m-%d"), sep = "-"))
LA_2022$Date <- as.Date(paste("2022", format(LA_2022$Date, "%m-%d"), sep = "-"))
# Check if dates are ordered correctly
LA_2002 <- LA_2002[order(LA_2002$Date), ]
LA_2022 <- LA_2022[order(LA_2022$Date), ]
# Plotting PM2.5 levels for 2002
plot_2002 <- ggplot(LA_2002, aes(x = Date, y = PM2.5)) +
geom_line(color = "turquoise") +
geom_point(color = "turquoise") +
scale_x_date(date_labels = "%b", date_breaks = "1 month") + # Set month labels
labs(title = "Change in PM2.5 in Los Angeles in 2002", x = "Month in 2002", y = "Daily Mean PM2.5 Concentration (µg/m³)") +
theme_minimal()
# Plotting PM2.5 levels for 2022
plot_2022 <- ggplot(LA_2022, aes(x = Date, y = PM2.5)) +
geom_line(color = "pink") +
geom_point(color = "pink") +
scale_x_date(date_labels = "%b", date_breaks = "1 month") + # Set month labels
labs(title = "Change in PM2.5 in Los Angeles in 2022", x = "Month in 2022", y = "Daily Mean PM2.5 Concentration (µg/m³)") +
theme_minimal()
# Arrange both plots side-by-side
grid.arrange(plot_2002, plot_2022, ncol = 2)
Summary of Observations
In Los Angeles County, air quality improved substantially from 2002 to 2022, as indicated by a decrease in PM\(_{2.5}\) levels. In 2002, the mean PM\(_{2.5}\) was 19.66 µg/m³, with a median of 17.4 µg/m³ and a standard deviation of 11.88 µg/m³, based on 1,879 observations. By 2022, the mean had dropped to 10.97 µg/m³ (a reduction of about 44%), with a median of 10.3 µg/m³ and a standard deviation of 5.24 µg/m³, based on a larger set of 5,070 observations.
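The summary above pools all Los Angeles County monitors together. A site-level view can be sketched by averaging within each site and year, which also shows which sites were observed in both years and so allow a like-for-like comparison. The data below are made up (`toy_la`, the site IDs, and the values are hypothetical); only the column names follow this report.

```r
# Toy Los Angeles records: two sites observed in both years (values made up)
toy_la <- data.frame(
  Site_ID = c(101, 101, 102, 102, 101, 101, 102, 102),
  Year    = c(2002, 2002, 2002, 2002, 2022, 2022, 2022, 2022),
  PM2.5   = c(20, 24, 16, 18, 11, 13, 9, 10)
)

# Mean PM2.5 for every site-year combination
site_means <- aggregate(PM2.5 ~ Site_ID + Year, data = toy_la, FUN = mean)
print(site_means)

# Sites present in both years allow a like-for-like comparison
both_years <- names(which(table(site_means$Site_ID) == 2))
print(both_years)
```

Restricting the comparison to sites in `both_years` avoids attributing a change in the county average to monitors that were added or removed between the two years.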